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Abstract 

Many scientific data-intensive applications perform iterative 
computations on array data. There exist multiple engines 
specialized for array processing. These engines efficiently 
support various types of operations, but none includes na¬ 
tive support for iterative processing. In this paper, we de¬ 
velop a model for iterative array computations and a series 
of optimizations. We evaluate the benefits of an optimized, 
native support for iterative array processing on the SciDB 
engine and real workloads from the astronomy domain. 


1. INTRODUCTION 

Science is increasingly becoming data-driv en [11] . From 
small research labs to large communities [31[ |17| , scientists 
have access to more data than ever before. As a result, 
scientists can increasingly benefit from using database man¬ 


agement systems to organize and query their data |15| 30 


Scientific data often takes the form of multidimensional ar¬ 
rays {e.g., 2D images or 3D environment simulations). One 
approach to managing this type of array data is to build 
array libraries on top of relational engines, but many ar¬ 
gue that simulating arrays on top of relations can be highly 
inefficient [^. Scientists also need to perform a variety of op¬ 
erations on their array data such as feature extraction |l4| , 
smoothing |25| , and cross-matching 23 , which are not built- 
in operations in relational Database Management Systems 
(DBMSs). Those operations also impose different require¬ 
ments than relational operators on a DBMS [^. 

As a result, many data management systems are being 
built to support the array model natively |25[ |36| . Addi¬ 
tionally, to handle today’s large-scale datasets, several en¬ 
gines, including SciDB 25 , provide support for processing 


arrays in parallel in a shared-nothing cluster. Several bench¬ 
mark studies have shown that these specialized array en¬ 
gines outperform both relational engines and MapReduce- 
type systems on a variety of array workloads . 

Many data analysis tasks today require iterative process¬ 
ing 0: machine learning, model fitting, pattern discovery. 
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flow simulations, cluster extraction, and more. As a result, 
most modern Big Data management and analytics systems 
{e.g., [16[ |34| ) support iterative processing as a first-class 
citizen and offer a variety of optimizations for these types 
of computations: caching asynchronous processing [16| , 
prioritized processing |20|| , etc. 

The need for efficient iterative computation extends to 
analysis executed on multi-dimensional scientific arrays. For 
example, astronomers typically apply an iterative outlier- 
removal algorithm to telescope images as one of the first 
data processing steps. Once the telescope images have been 
cleaned, the next processing step is to extract sources (i.e., 
stars, galaxies, and other celestial structures) from these im¬ 
ages. The source extraction algorithm is most easily written 
as an iterative query as well. As a third example, the simple 
task of clustering data in a multi-dimensional array also re¬ 
quires iterating until convergence to the final set of clusters. 
We further describe these three applications in Section 

While it is possible to implement iterative array computa¬ 
tions by repeatedly invoking array queries from a script, this 
approach is highly inefficient (as we show in Figure 10(a) I. 
Instead, a large-scale array management systems such as 
SciDB should support iterative computations as first-class 
citizens in the same way other modern data management 
systems do for relational or graph data. 

Contributions: In this paper, we introduce a new model 
for expressing iterative queries over arrays. We develop a 
middleware system called ArrayLoop that we implement on 
top of SciDB to translate queries expressed in this model 
into queries that can be executed in SciDB. Importantly, 
ArrayLoop includes three optimizations that trigger rewrites 
to the iterative queries and ensure their efficient evaluation. 
The first optimization also includes extensions to the SciDB 
storage manager. More specifically, the contribution of this 
paper are as follows: 

(1) New model for iterative array processing (Sec¬ 
tions and 1^: Iterating over arrays is different from it¬ 
erating over relations. In the case of arrays, the iteration 
starts with an array and updates the cell values of that ar¬ 
ray. It does not generate new tuples as in a relational query. 
Additionally, these update operations typically operate on 
neighborhoods of cells. These two properties are the foun¬ 
dation of our new model for iterative array processing. Our 
model enables the declarative specification of iterative ar¬ 
ray computations, their automated optimization, and their 
efficient execution. 

(2) Incremental iterative processing (Section [^: In 
many iterative applications, the result of the computation 







changes only partly from one iteration to the next. As such, 
implementations that recompute the entire result every time 
are known to be inefficient. The optimization, called incre¬ 
mental iterative processing [^, involves processing only the 
part of the data that changes across iterations. When this 
optimization is applicable, it has been shown to significantly 
improve performance in relational and graph systems [8l|20|. 
This optimization also applies to array iterations. While it 
is possible to manually write a set of queries that process 
the data incrementally, doing so is tedious, error-prone, and 
can miss optimization opportunities. Our iterative array 
model enables the automatic generation of such incremen¬ 
tal computations from the user’s declarative specification of 
the overall iterative query. Additionally, while the idea of 
incremental iterations has previously been developed for re¬ 
lational systems, its implementation in an array engine is 
very different: For an array engine, the optimization can 
be pushed all the way to the storage manager with signif¬ 
icant performance benefits. We develop and evaluate such 
storage-manager-based approach to incremental array pro¬ 
cessing. 

(3) Overlap iterative processing (Section]^: In itera¬ 
tive array applications, including, for example, cluster find¬ 
ing and source detection, operations in the body of the loop 
update the value of the array cells by using the values of 
other neighboring array cells. These neighborhoods are of¬ 
ten bounded in size. These applications can effectively be 
processed in parallel if the system partitions an array but 
also replicates a small amount of overlap cells. In the case of 
iterative processing, the key challenge lies in keeping these 
overlap cells up to date. This optimization is specific to 
queries over arrays and does not apply to relational engines. 
Our key contribution here lies in new mechanisms for man¬ 
aging the efficient reshuffling of the overlap data across iter¬ 
ations. 

A subset of applications that leverage overlap data also 
have the property that overlap cells can be updated only ev¬ 
ery few iterations. Examples of such applications are those 
that try to find structures in the array data. They can And 
structures locally, and need to exchange information only 
periodically to stitch these local structures into larger ones. 
We extend our overlap data shuffling approach to leverage 
this property and further reduce the overhead of synchroniz¬ 
ing overlap data. We call this optimization, mini-iterations. 

(4) Multi-resolution iterative processing (Section [^: 
Finally, in many applications, the raw data lives in a contin¬ 
uous space (3D universe, 2D ocean, N-D space of continuous 
variables) and arrays capture discretized approximations of 
the real data. Different data resolutions are thus possible 
and scientifically meaningful to analyze. In fact, it is com¬ 
mon for scientists to look at the data at different levels of 
detail. In many applications, it is often efficient to first pro¬ 
cess the low-resolution versions of the data and use the result 
to speed-up the processing of finer-resolution versions of the 
data if requested by the user. Our final optimization au¬ 
tomates this approach. While scientists are accustomed to 
working with arrays at different levels or detail, our contri¬ 
bution is to show how this optimization can be automatically 
applied to iterative queries in an array engine. 

(5) Implementation and evaluation We implement the 
iterative model and all three optimizations as extensions to 
the open-source SciDB engine and we demonstrate their ef¬ 
fectiveness on experiments with 1 TB of publically-available 


synthetic LSST images [^. Experiments show that Incre¬ 
mental iterative processing can boost performance by a fac¬ 
tor of 4-6X compared to a non-incremental iterative com¬ 
putation. Iterative overlap proeessing together with mini¬ 
iteration processing can improve performance by 31% com¬ 
pare to SciDB’s current implementation of overlap process¬ 
ing. Finally, the multi-resolution optimization can cut run¬ 
times in half if an application can leverage this technique. 
Interestingly, these three optimizations are complementary 
and their benefits can be compounded. 

To the best of our knowledge, this paper is the first to de¬ 
sign, implement, and evaluate an approach for iterative pro¬ 
cessing in a parallel array data management system. Given 
that array engines have been shown to outperform a variety 
of other systems on array workloads and that iter¬ 

ative analytics are common on array data (as we discussed 
above), efficient support for iterative query processing in ar¬ 
ray engines is a critical component of the big data engine 
ecosystem. 

2. MOTIVATING APPLICATIONS 

We start by presenting three array-oriented, iterative ap¬ 
plications. We use these applications as examples through¬ 
out the paper and also in the evaluation. 

Example 2.1. Sigma-clipping and co-addition in 
LSST images (SigmaClip): The Large Synoptic Survey 
Telescope (LSST [^) is a large-scale, multi-organization 
initiative to build a new telescope and use it to continuously 
survey the visible sky. The LSST will generate tens of TB of 
telescope images every night. Before the telescope produces 
its first images, astronomers are testing their data analysis 
pipelines using realistic but simulated images. 

When analyzing telescope images, some sources (a 
“source” can be a galaxy, a star, etc.) are too faint to be 
detected in one image but can be detected by stacking mul¬ 
tiple images from the same location on the sky. The pixel 
value (flux value) summation over all images is called image 
co-addition. Figure shows a single image and the corre¬ 
sponding co-added image. Before the co-addition is applied, 
astronomers often run a “sigma-clipping” noise-reduction al¬ 
gorithm. The analysis in this case has two steps: (1) out¬ 
lier filtering with “sigma-clipping” and then (2) image co¬ 
addition. Listing shows the pseudocode for both steps. 
Sigma-clipping consists in grouping all pixels by their (x,y) 
coordinates. For each location, the algorithms computes the 
mean and standard deviation of the flux. It then sets to null 
all cell values that lie k standard deviations away from the 
mean. The algorithm iterates by re-computing the mean 
and standard deviation. The cleaning process terminates 
once no new cell values are filtered out. Throughout the 
paper, we refer to this application as SigmaClip. □ 

Example 2.2. Iterative source detection algorithm 
(SourceDetect): Once telescope images have been cleaned 
and co-added, the next step is typically to extract the actual 
sources from the images. 

The pseudocode for a simple source detection algorithm is 
shown in Listing]^ Each non-empty cell is initialized with 
a unique label and is considered to be a different object. At 
each iteration, each cell resets its label to the minimum label 
value across its neighbors. Two cells are neighbors if they 
are adjacent. This procedure continues until the algorithm 



(a) Single image (b) Co-added image 


Figure 1: Illustrative comparison of a single tele¬ 
scope image and its corresponding co-added image. 
Many faint objects become visible after co-addition. 


Listing 1 Pseudocode for SigmaClip application 

Input: Array A with pixels from x-y images over time. 
//Part 1: Iterative sigma-clipping 
While(some pixel changes in A) 

For each (x,y) location 

Compute mean/stddev of all pixel values at (x,y). 
Filter ciny pixel value that is k 
standard deviations away from the meein 
//Part 2: Image co-addition 

Sum all non-null pixel values grouped by x-y 


converges. We refer to this application as SourceDetect. 

□ 


Example 2.3. K-means clustering algorithm 
(KMeans): In many domains, clustering algorithms 

are commonly used to identify patterns in data. Their use 
extends to array data. We consider in particular K-means 
clustering on a 2D array 12 . K-means clustering works 


as follows: It assigns each cell randomly to one of the k 
clusters. It computes the centroid of each cluster. It iterates 
by re-assigning each cell to its nearest cluster. We refer to 
this application as KMeans. □ 


These applications illustrate two important properties of 
iterative computations over arrays. First, the goal of an it¬ 
erative computation is to take an array from an initial state 
to a final state by iteratively refining its content. The Sig¬ 
maClip application, for example, starts with an initial 3D 
array containing 2D images taken at different times. Each 
iteration changes the cell values in this array. The iteration 
terminates when no cell changes across two iterations. Sec¬ 
ond, the value of each cell at the next iteration is determined 
by the values of a group of cells with some common charac¬ 
teristics at the current iteration. Those characteristics are 
often mathematically described for any given cell in the ar¬ 
ray. For SigmaClip those are “all pixel values at the same 
(x,y) location”. Interestingly, unlike SigmaClip, where each 
group of cells at the same {x, y) location influences many 
cell-values at the next iteration, in the SourceDetect algo¬ 
rithm any given cell (*, y) is influenced by a unique group of 
cells, which are its adjacent neighbors. These groups of cells 
partially overlap with each other, which complicates parallel 
processing as we discuss in Section]^ 


3. ITERATIVE ARRAY-PROCESSING 
MODEL 

We start with a formal definition of an array similar to 
Furtado and Baumann [^: Given a discrete coordinate set 
S = Si X ... X Sd, where each Si,i € [1, d] is a finite totally 
ordered discrete set, an array is defined by a d-dimensional 
domain D = [Ii,..., Id], where each U is a subinterval of the 


Listing 2 Pseudocode for SourceDetect application 

Input: Co-added Array A with uniquely labeled pixels from 
all the x-y images. 

Input: int r, the adjacency threshold. 

While(some pixel changes in A) 

For each (x,y) location 

Compute the minimum label of all pixel values (x’,y’) 
with x-r <= x^<= x+r and y-r <= y’<= y+r. 



Figure 2: Iterative array A and its state at each 
iteration for the SourceDetect application. {Qfeiis(A) • 
Vcij G cells{A) i G h j G h } where h = h = 
{1,2, 3,4} are the sets of dimension values, applies 
mini) aggregate on each group of cells, 5^ simply 
stores the aggregated value in each cell Cij, and tt : 
ix,y) —>■ [a: ± l][i/ ± 1]. At each iteration, a sliding 
window scans through all the cells. 

corresponding Si. Each combination of dimension values in 
D defines a cell. All cells in a given array A have the same 
type T, which is a tuple. cells{A) is the set of all the cells 
in array A and function V : cells{A) —> T maps each cell in 
array A to its corresponding tuple with type T. In the rest 
of the paper, we refer to the dimension x in array A as A[x\ 
and to each attribute y in the array A as A.y. 

In SciDB, users operate on arrays by issuing declarative 
queries using either the Array Query Language (AQL) or the 
Array Functional Language (AFL). The select statements 
in Algorithmj^in Sectionj^are examples AQL queries. AQL 
and AFL queries are translated into query plans in the form 
of trees of array operators. Each operator O takes one or 
more arrays as input and outputs an array: O : A —>■ A or 
O : A X A ->• A. 

In an iterative computation, the goal is to start with an 
initial array A and transform it through a series of opera¬ 
tions in an iterative fashion until a termination condition 
is satisfied. The iterative computation on A typically in¬ 
volves other arrays, including arrays that capture various 
intermediate results [e.g., arrays containing the average and 
standard deviation for each (x, y) location in the SigmaClip 
application) and arrays with constant values [e.g., a connec¬ 
tivity matrix in a graph application). 

One can use the basic array model to express iterative 
computations. The body of the loop can simply take the 
form of a series of AQL or AFL queries. Similarly, the termi¬ 
nation condition can be an AQL or AFL query. In Sectionj^ 
the first function in Algorithm illustrates this approach. 

To enable optimizations, however, we extend the basic 
array model with constructs that capture in greater details 
how iterative applications process arrays. We start with 
some definitions. 

Definition 3.1. We call an array iterative if its cell-values 
are updated during the course of an iterative computation. 
The array starts with an initial state Aq. As the itera¬ 
tive computation progresses, the array goes through a set 
of states Ai, A 2 , A 3 , ..., until a final state An. Note that 
all Ai have the same schema. In other words, the shape of 



















































an iterative array does not change. 

Figure shows a (4x4) iterative array that represents a 
tiny telescope image in the SourceDetect application. In 
the initial state, Aq, each pixel with a flux value above a 
threshold is assigned a unique value. As the iterative com¬ 
putation progresses, adjacent pixels are re-labeled as they 
are found to belong to the same source. In the final state 
As, each set of pixels with the same label corresponds to one 
detected source. 

Iterative applications typically define a termination con¬ 
dition that examines the cell-values of the iterative array: 

Definition 3.2. An iterative array A has converged, when¬ 
ever T{Ai, Ai+\) < e for some aggregate function T. T is 
the termination condition, e is a user-specified constant. 

In Figure]^ convergence occurs at iteration 3 when e = 0 
and the termination condition T is the count of differences 
between At and Ai+\. Our ArrayLoop system represents T 
as AQL function. 

An iterative array computation takes an iterative array, 
A, and applies to it a computation Q until convergence: 

Ao ^ Ai ^ ... ^ Ai ^ Ai+i ( 1 ) 

where Q is a sequence of valid AQL or AFL queries. At 
each step, Q can either update the entire array or only some 
subset of the array. We capture the distinction with the 
notion of major and minor iteration steps: 

Definition 3.3. A state transition, Ai Ai+i, is a major 
step if the function Q operates on all the cells in A at the 
same time. Otherwise it is a minor step. 

The array state Aij represents the state of the iterative 
array after i major steps followed by j minor steps. We 
are interested in modeling computations where each major 
step can be decomposed into a set of minor steps that can 
be evaluated in parallel. That is, a major step Qi can be 
expressed as a set of minor steps qi such that a, Qi = gi,ai ■ 

f}i,a2 ■ ■ • ' Qi.cjn ■ 

The iterative array computation in Equation includes 
{i -I- 1) major steps. The first line illustrates the transition 
of iterative array A in major steps and the second line illus¬ 
trates the possible minor steps that can replace the major 
step Qi+i. A termination condition check always occurs be¬ 
tween two states of an iterative array after a major step. 

Ao ^ Ai ^ ...A,_i ^ Ai A.+i (2) 


Figure]^ shows an iterative array computation with only 
major steps involved, while Figure [^presents the same ap¬ 
plication but executed with minor steps. 

We further observe from the example applications in Sec¬ 
tion that the functions Q often follow a similar pattern. 

First, the value of each cell in iterative array Ai+i that 
is updated by Q only depends on values in nearby cells in 
array Ai. We capture this spatial constraint with a function 
TT that specifies the mapping from output cells back onto 
input cells: 



Figure 3: Iterative array A and its state after three 
minor steps, each of the form: Qij = QI. where 
Ci,j is the cell at A[i][j], applies mini) aggregate, 5 
simply stores the aggregate result as the new value 
in cell Cij , and tt : {x,y) —>■ [a: ± 1 ] [j/ ± 1 ] 

Definition 3.4. tt is an assignment function defined as tt : 
cells{A) —^ V (ceZZs(A)), where cells{A) is the set of all the 
cells in array A and VQ is the powerset function. 

Figure [^illustrates two examples of assignment functions. 
Our ArrayLoop system supports two types of assignment 
functions: windowed functions such as those illustrated in 
Figure [^ and attribute assignment function. The latter oc¬ 
cur in applications such as K-means clustering described in 
Example |2.3| tt : (x, y) —>■ label where all the cells with the 
same label are grouped together. 

Definition 3.5. is an aggregate function defined as : 
cells{A) —>■ r. groups the cells of the array A according 
to assignment function tt, with one group of cells per cell 
in the array A. It then computes the aggregate functions 
separately for each group. The aggregate result is stored in 
tuple T. 

Finally, Q updates the output array with the computed 
aggregate values: 

Definition 3.6. 5^ : {cells{A), f") —>■ cellsiA) is a cell- 
update function. It updates each cell of the array A with 
the corresponding tuple r computed by and the current 
value of the cell itself. 

These three pieces together define the iterative array com¬ 
putation Qq as follows: 

Definition 3.7. An iterative array computation Qq on 
the subset of cells C where C € P(cells(A)) generates subset 
of cells C € Vicells{A)) such that 'ic £ C and c £ C 
c = S^{c, f^ic)) where c and c are two corresponding cells 
in those subsets. 

In the example from Figures and which illustrate the 
SourceDetect application, the goal is to detect all the clus¬ 
ters in the array A, where each cell pi = {xi, yi) in a cluster 
has at least one neighbor ps = {xs , 1 / 2 ) in the same cluster 
such that |xi — * 2 ! < 1 and \y\ — y2\ < 1, if it is not a 
single-cell cluster. In this application, tt is the 3X3 window 
around a cell. We slide the window over the array cells in 
major order. At each minor step, at each cell dj at the 
center of the window, we apply an iterative array computa¬ 
tion Qij = QI. ’) where applies a mini) aggregate over 
the 3x3 window, tt, and 5^ is a cell-update function that 
simply stores the result of the mini) aggregate into cell Ci^j. 
Figure [^illustrates three steps of this computation. Notice 
that the output of the iterative array computation Qo,q be¬ 
comes the input for Qo.i and so on. Another strategy is to 
have many windows grouped and applied together. In other 
















































(a) SourceDe- (b) SigmaClip 
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Figure 4: Two examples of window assignment func¬ 
tions: (a) TTi : {x,y,z) —>■ [a; ± l][i/± l][z ± 1], the associ¬ 
ated window is highlighted for the cell at (2,1,2). (b) 
7r2 : {x,y,z) — > the associated window is high¬ 

lighted for all the cells at (x, y, z) with z = 0. 

words, instead of applying the iterative array computation 
per cell, we apply on a group of cells C € V {cells{A)) 

in one major step. Note that when using minor steps, the 
output of each minor step serves as input to the next step. In 
contrast, when using major steps, the iterative array compu¬ 
tations see the original array state at the beginning of that 
iteration. Figurej^shows the iterative array computation for 
the latter strategy. The former strategy has less expensive 
steps than the latter strategy, but it requires more steps to 
converge. 

In our model, we encapsulate all the elements of the model 
in a FixPoint operator: 

FixPoint{A, tt, f, S, T, e) (3) 

With our model, the user specifies the logic of the iterative 
algorithm without worrying about the way it is going to be 
executed. Our model can be implemented and executed on 
top of various array execution engines. In the rest of the 
paper, we describe how the queries specified in our model are 
rewritten and efficiently run in the SclDB array engine. The 
execution strategy in SciDB uses only major steps. Mini¬ 
step iterations, i.e. asynchronous execution, is left for future 
study. 

4. ITERATIVE ARRAY PROCESSING 

We extend SciDB-Py with a python FixPointi) oper¬ 
ator following the model from Section]^ We also develop an 
optimizer module that we name ArrayLoop. The user encap¬ 
sulates its iterative algorithm in the FixPointi) operator. 
The ArrayLoop optimizer sits on top of SciDB. ArrayLoop 
rewrites a FixPointiAjiv, f,S,T,e) operator into the AQL 
queries in Listing[^that it wraps with an internal while loop. 

is_window helper function in Listing clarifies whether 
the window assignment function translates to a window ag¬ 
gregate or a group-by aggregate. ArrayLoop translates a 
window assignment function to a group-by aggregate if map¬ 
ping is from a set of input dimensions to one of its subsets. 
If mapping is from a set of dimensions to the same set of 
dimensions with additional offsets per dimension, then Ar¬ 
rayLoop translates it to window-aggregate. Supporting win¬ 
dow assignment function that is a combination of group-by 
aggregate and window aggregate is left for future work. 

In addition, ArrayLoop also implements a set of query re¬ 
writing tasks in order to leverage a series of optimizations 
that we develop: incremental iterative processing, overlap it¬ 
erative processing, and multi-resolution iterative processing. 

ArrayLoop acts as a pre-processing module before exe- 


Listing 3 Pseudocode for rewriting FixPoint operator 

Input: 

FixPoint(A,pi,f,delta,!,epsilon) 

Output: 

While (T(A,A_prev) < epsilon) 

// Termination function T is also AQL function. 

// Compute the new aggregates from the current iterative array. 
If (is_window(pi)) 

G = SELECT f FROM A WINDOW PARTITIONED BY pi 
else 

G = SELECT f FROM A GROUP BY pi 
// Combine the new aggregate with the old value of the cell. 

S = SELECT * FROM G JOIN A ON <matching dimensions> 

A_new = SELECT deltaCS) FROM S 
A_prev = A 
A = A new 



(a) (b) (c) (d) 


Figure 5: Snapshots from the first 3 iterations 
of the SigmaClip application with the incremental- 
processing optimization on the LSST dataset. 
Green-colored points are the ones that change across 
iterations, (a) Original Image (b) Iteration-1 (c) 
Iteration-2 (d) Iteration-3. 

cuting the iterative query in SciDB. Currently the majority 
of the ArrayLoop implementation is outside the core SciDB 
engine. As future work, we are planning to push the Array- 
Loop python prototype into the core SciDB engine. Array- 
Loop relies on SciDB for features such as distributed query 
processing, fault-tolerance, data distribution, and load bal¬ 
ancing. In the following sections, we describe each of the 
three optimizations in more detail. 

5. INCREMENTAL ITERATIONS 

In a wide range of iterative algorithms, the output at each 
iteration differs only partly from the output at the previ¬ 
ous iteration. Performance can thus significantly improve 
if the system computes, at each iteration, only the part of 
the output that changes rather than re-computing the entire 
result every time. This optimization called incremental it¬ 
erative processing is well-known, e.g. in semi-naive data- 
log evaluation, and has been shown to significantly improve 
performance in relational and graph systems. ArrayLoop 
leverages the iterative computation model from Section 
to automatically apply this optimization when the seman¬ 
tics of the applications permit it. The SigmaClip applica¬ 
tion described in Section [2.1| is an example application that 
can benefit from incremental iterative processing. Figure 
shows multiple snapshots of running the sigma-clipping al¬ 
gorithm using incremental iterative processing, on a subset 
of the Isst dataset. Green-colored points are the ones with 
changed values across two consecutive iterations. As the it¬ 
erative computation proceeds, the number of green-colored 
points drops dramatically and consequently the amount of 
required computation at that step. 

sigma-clippingO and incr-sigma-clippingO modules 
in Algorithm show the original implementation and the 
manually-written incremental version of the implementa¬ 
tion, respectively. In the sigma-clipping() module, the 
avgO and stdvO aggregate operators are computed over 
the whole input at each iteration, which is inefficient. In 









































Algorithm 1 SigmaClip application followed by image co¬ 
addition 

1. function sigma-clipping(A,fe) > Nai've sigma-clip 

2. Input: Iterative Array A <float d>[x,y,t] 

3- Input: k a constant parameter. 

4. while (some pixels A[x, y, t] are filtered) do 

5- T\x. y\ = select avg(^d') as y,, stdv(^d) as <7 from A group by x, y 

6. S[x, y, t\ = select * from T join A on T.x ~ A.x and T.y = A.y 

7. A[x, y, t] = select d from S where y — kXcr‘^d^y-\-kXcT 

8. end while 

9. end function 


Algorithm 2 ArrayLoop version of the SigmaClip application 
followed by image co-addition 

1. function ArrayLoop-signia-clipping(A,fc) > SigmaClip algorithm with 

FixPoint operator provided by the user. 

2. Input: Iterative Array A <float d>[x,y,t], 

3- Input: k a constant parameter. 

4. ,r : WMW ^ [x][y]. 

5- 5 : “A.d > p — k X (7 and A.d < ^ -j- fc X C7?A : null” 

6. / : {auy{) as y, stdvQ as a} 

7. FixPoint(A, tt, /, <5, count{), 0) 

8. end function 


10. function incr-sigrna-clipping{A,fe) 

11. Input; Array A <float d>[x,y,t]. 

12. Input: k: a constant parameter, 
cal: Array C <int c,float s,float s^>[x,y]. 


> Incremental sigma-clip 


14. 

15. 


zal; Collect • 
Local; Re. 

AA •(- A 


• 4 > 


> Collects all the filtered points, 
> Keeps track of remaining points. 


9. function ArrayLoop-incr-sigma-clipping(A,fc) > ArrayLoop incremental 
rewriting of the SigmaClip. 

10. Input: Iterative Array A <float d>[x,y,t], 

11. Input; fc; a constant parameter. 

12. Local; Iterative Array C <int c,float s,float s^>[x,y], 

13. Local: Array S <float cr,float /j,>[x,y], 

14. AA~ ■«- A 


while (AA i: 

s not empty) do 





Ti [x,y] 
by X, y 

select count(d) as c, sum(<i) as s, sum(d^) i 

IS from AA group 

16. 

T[x, y] ■<— select count(d) as c, sum((i) as s, sum(d^) c 
hy X, y 

IS s^ from AA group 


19. 

if (first iteration) then 



20. 


C ^ Ti[x, y] 



21. 

else 



22. 


AC[x,y] -t- select C.c - T^.c as c , C.s - T^.s a 
from C join on . x = C.x Ti.y = C.y 

B B , C.A - Ti.s^ a 

s s^ 


24. 

y] *— select 0" ® AS y, AS cr from 

AC 


25. 

S[x,y,t\ select A.d, T.y, T.cr from T join Ren 

A.x and T.y~A.y 

lain o 

n T.x = 

26. 

AA •<— select d from S where d <- y — fc X cr or d '> y k 

X cr 





end while 


> Updates Remain. 
> Adds the filtered points to Collect. 
> Produces the final state for A. 


18. 

19. 

20 . 


if (first iteration) then 
C -f- T[x. y] 
e lse 

I merge(C,T,C.c - T.c) | 

I merge(C,T,C.s - T.17~| 

merge(C,T,C.a2 - T.a^) 

end if 

■Sfx, y] ■<— select ^ AS y. 


- AS o- FROM A + C 


erge(A, 5, S.y — fc X S.a < A.d < S.y -|- fc X S'.o’?A ; • 


25. 

26. end while 

27. end function 
co-addition phase: 

28. R[x, y] ■<— select sum(A.d) i 


all) 


add from A group by : 


31. end function 
co-addition phase: 

32. i?[x, y] <— select sum(A.d) as coadd from A group by x, y 


incr-sigma-clippingO, the user rewrites the avgO and 
stdvO aggregate operators in terms of two other aggre¬ 
gate operators count () and sumO (Algorithm Lines 18 
and |24[ ). The user also needs to carefully merge the current 
partial aggregates with the aggregate result of the previous 
iteration (Algorithm Line |22[ ). As shown in Algorithm]^ 
writing an efficient incremental implementation is not a triv¬ 
ial task. It is painful for users if they need to rewrite their 
algorithms to compute these increments and manage them 
during the computation. Ideally, the user wants to define the 
semantics of the algorithm and the system should automat¬ 
ically generate an optimized, incremental implementation. 
Additionally, as we show in the evaluation, if the system is 
aware of the incremental processing, it can further optimize 
the implementation by pushing certain optimizations all the 
way to the storage layer. 


5.1 Rewrite for Incremental Processing 

In ArrayLoop, we show how the incremental processing 
optimization can be applied to arrays. As shown in Al¬ 
gorithmic with ArrayLoop, the user provides a FixPoint 
operator in ArrayLoop-sigma-clipping function. Array- 
Loop automatically expands and rewrites the operation into 
an incremental implementation as shown in the ArrayLoop- 
incr-sigma-clipping function. The rewrite proceeds as 
follows. If the aggregate function / is incremental, Array- 
Loop replaces the initial aggregation with one over Delta A- 
or Delta A+ or both. For example, for ArrayLoop-incr- 
sigma-clipping, only negative delta arrays are generated 
at each iteration(there is no AA^). So the rewrite pro¬ 
duces a group-by aggregate only on AA~ (line 16). Next, 


ArrayLoop merges the partial aggregate values with the ag¬ 
gregate results from the previous iteration (lines 20 through 
22). The aggregate rewrite rules define that merge logic for 
all the aggregate functions. In this example, ArrayLoop will 
generate one merge statement per aggregate function com¬ 
puted earlier. Finally, on Line 24, ArrayLoop does the final 
computation to generate the final aggregate values for this 
iteration. Note that finalize phase in the aggregate compu¬ 
tation is always done on positive delta arrays (AC^), which 
generates the same result as computing on negative delta 
array AC~ followed by a subtract merge plus computing 
on positive delta array AC^ followed by an addition merge. 
Line 25 leverages the <5 function to generate the AA~ of the 
next iteration. 

Currently the decision whether the application seman¬ 
tics permit incremental iterative processing is left to the 
user. Given the FixPoint operator, ArrayLoop performs 
two tasks: (1) it automatically rewrites aggregate functions, 
if possible, into incremental ones and (2) it efficiently com¬ 
putes the last state of the iterative array using the updated 
cells at each iteration. The automatic rewrite is enabled by 
the precise model for iterative computations in the form of 
the three functions tt, /, and 5. Given this precise specifica¬ 
tion of the loop body, ArrayLoop rewrites the computation 
using a set of rules that specify how to replace aggregates 
with their incremental counter-parts when possible. To ef¬ 
ficiently compute incremental state updates, we introduce a 
special merge operator. We now describe both components 
of the approach. 

(1) Automatic aggregate rewrite: ArrayLoop trig¬ 
gers the incremental iterative processing optimization if any 
aggregate function in the FixPoint operator is flagged as 
incremental. The Data cube paper defines an aggre¬ 
gate function F{) as algebraic if there is an M-tuple valued 
function G{) and a function H{) such that: FdAij}) = 
H{{G{{Xij}\i = l,...,/})|j = 1,...,J}). ArrayLoop 




































stores a triple {agg, {Gi,..., Gk}, H) for any algebraic func¬ 
tion in the system and rewrites the aggregate query in 
terms of G() and HQ) functions during the query rewrit¬ 
ing phase. For example, ArrayLoop records the triple 
(avgO ,-[suin(), count ()]■,sum/count) and rewrites the al¬ 
gebraic average function avgO using the combination of 
sumO and count () to leverage incremental iterative pro¬ 
cessing. 

(2) Incremental state management: ArrayLoop pro¬ 
vides an efficient method for managing array state and incre¬ 
mental array updates during the course of an iterative com¬ 
putation. We observe that, during incremental processing, a 
common operation is to merge the data in two arrays, which 
do not necessarily have the same number of dimensions. In 
our example application, merging happens when the partial 
aggregates are combined with the aggregate result of the 
previous iteration, line |22| in incr-sigma-clippingO func¬ 
tion. This operation merges together two 2D arrays where 
the merge logic is inferred from the incremental aggregate 
function /. Such merging also happens when the results 
of the aggregate function are used to update the iterative 
array, lines |25| and |26| in incr-sigma-clippingO function. 
In this case, the application merges the data in a 2D array 
with the data in a 3D array by sliding or extruding the 2D 
array through the 3D array. The 5 cell-update function de¬ 
fines the logic of the merge operation in this case. The tt 
assignment function pairs-up cells from the intermediate ag¬ 
gregation array and the iterative array that merge together 
and thus determines whether merging will occur between 
arrays with the same number of dimensions or not. 

In the manual implementation, shown in the incr-sigma- 
clippingO function, the user implements the merge logic 
manually using join and filter queries, which is inefficient[^ 
To remove this inefficiency, given the FixPoint operator, Ar¬ 
rayLoop automatically generates queries with explicit merge 
points that leverage a new merge operator that we add to 
SciDB: merge(Array source, Array extrusion, Expres¬ 
sion exp). 

The new merge operator is unique in a sense that it not 
only specifies the merge logic between two cells via a math¬ 
ematical expression, exp, but it also automatically figures 
out which pairs of cells from the two arrays merge together 
by examining their common dimensions. ArrayLoop merges 
two cells from the source array and extrusion array if they 
share the same dimension-values in those dimensions that 
match in dimension-name. One cell in the extrusion array 
can thus merge with many cells in the source array. Figure [6 
illustrates the merge operator for queries in lines |25| and ^6 
in Algorithm As the figure shows, the extrusion array 
slides through the source array as the merging proceeds. 

5.2 Pushing Incremental Computation into 
the Storage Manager 

We observe that increments between iterations translate 
into updates to array cells and can thus be captured with 
two auxiliary arrays: a positive delta array and a negative 
delta array. At each iteration, the positive delta array AA^ 

^From an engineering point of view, the new merge oper¬ 
ator, unlike a join, can also leverage vectorization where 
instead of merging one pair of matching cells at a time, Ar¬ 
rayLoop merges group of matching cells together, potentially 
improving query runtime, especially when the number of di¬ 
mensions in the two input arrays is different. 


records the new values of updated cells and the negative 
delta array AA~ keeps track of the old values of updated 
cells. Delta arrays can automatically be computed by the 
system directly at the storage manager level. 

As a further optimization, we extend the SciDB storage 
manager to manage simple merge operations such as ad¬ 
dition/subtraction found in Lines 20 21 and 22 of Algo¬ 
rithm ArrayLoop uses naming conventions as a hint to 
the storage manager about the semantics of the merge op¬ 
eration. For example A(_) B, asks the storage manager 
to subtract array B from array A and store the result of the 
(A — B) operation as the new version of array A. In case 
array A is iterative, the new values and the old values of 
updated cells are stored in A'^A and A~A, respectively. 

Typically, the user need not worry about these annota¬ 
tions and processing details since ArrayLoop automatically 
generates the appropriate queries from the FixPoint specifi¬ 
cation. However, the user can leverage these optimizations 
manually as well. For example, the queries in the incr- 
sigma-clippingO function at Lines [27{|28[ and|30| (queries 
with red box frames) can all be pushed into the storage 
manager. 

To achieve high performance, the storage manager keeps 
chunks of the result array A together on disk with the corre¬ 
sponding chunks from the auxiliary A A"*" and AA“ arrays. 
As we showed in previous work on array versioning , the 
space overhead of delta arrays taken between similar array 
versions is typically insignificant compared with the size of 
the original array. 

We extend the ScanO and StoreO operators to read and 
write partial arrays AA^ and AA“, respectively. With 
those optimizations, the user does not need to explicitly 
write a user-defined diffO function or, as shown in the 
incr-sigma-clippingO example, a sequence of joinO and 
filterO queries in order to extract delta arrays from the 
output of the last iteration. 

In prior work [^, we demonstrated a prototype imple¬ 
mentation of the SigmaClip application together with the 
incremental iterative processing optimizations in the con¬ 
text of the AscotDB system that we built. AscotDB results 
from the integration of ASCOT, a Web-based tool for the 
collaborative analysis of telescope images and their meta¬ 
data, and SciDB, a parallel array processing engine. The 
focus of the demonstration was on this system integration 
and on the impact of the optimizations on the application. 
AscotDB shows that average users who use graphical inter¬ 
faces to specify their analysis also benefit from optimized 
iterative processing provided by lower layers of the analysis 
stack. 


6. ITERATIVE OVERLAP PROCESSING 

To process a query over a large-scale array in parallel, 
SciDB (and other engines) break arrays into sub-arrays 
called chunks, distribute chunks to different compute nodes 
(each node receives multiple chunks), and process chunks in 
parallel at these nodes. For many operations, such as fil¬ 
ter for example, one can process chunks independently of 
each other and can union the result. This simple strategy, 
however, does not work for many scientific array operations. 
Frequently, the value of each output array cell is based on 
a neighborhood of input array cells. Data clustering is one 
example. Clusters can be arbitrarily large and can go across 
array chunk boundaries. A common approach to computing 
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Figure 6: merge(A, T, T./i — k x T.a < A.d < T.fj, + k x 
T.alA : null) in SigmaClip application. This is the 
core filtering step where the outliers are removed. 
The 3D source array A <float d>{x,y,t\ and the 2D 
extrusion array (highlighted) T <float /i, float (j>{x,y\ 
share the first two dimensions, (a), (b), (c), and (d) 
show how the cells in the extrusion array slide into 
the source array at runtime. 


Array S 



Figure 7: Illustration of the mini-iteration optimiza¬ 
tion. Am represents local changes at iteration i of 
mini-iteration m and A* is the global change at iter¬ 
ation i. 


such operations in parallel is to perform them in two steps: 
a local, parallel step followed by an aggregate-type post¬ 
processing step [13[ |14[ |18| that merges partial results into a 
final output. For the clustering example, the first step finds 
clusters in each chunk. The second step combines clusters 
that cross chunk boundaries [^. Such a post-processing 
phase, however, can add significant overhead. To avoid a 
post-processing phase, some have suggested to extract, for 
each array chunk, an overlap area e from neighboring chunks. 


store the overlap together with the original chunk 25 28 


and provide both the core data and overlap data to the op¬ 
erator during processing . This technique is called overlap 
processing. Figure shows an example of array chunks with 
overlap. We refer the interested reader to our ArrayStore 
paper for a more detailed discussion of efficient overlap 
processing techniques. These techniques, however, do not 
address the question of how best to update the overlap data 
during an iterative computation. Our contribution in this 
paper is to tackle this specific question. 


6.1 Efficient Overlap Processing 

Array applications that can benefit from overlap process¬ 
ing techniques are those that update the value of certain ar¬ 
ray cells by using the values of neighboring array cells. The 
SourceDetect application described in Section [2.2| is an ex¬ 
ample application that can benefit form overlap processing. 
Other example applications include “oceanography particle 
tracking”, which follow a set of particles as they move in a 
2D or 3D grid. A velocity vector is associated with each cell 
in the grid and the goal is to find a set of trajectories, one 
for each particle in the array. Particles cannot move more 
than a certain maximum distance (depending on the maxi¬ 
mum velocity of particles) at each step. These applications 
can be effectively processed in parallel by leveraging overlap 
processing techniques. 

The challenge, however, is to keep replicated overlap cells 
up-to-date as their values change across iterations. To effi¬ 
ciently update overlap array cells, we leverage SciDB’s bulk 
data-shuffling operators as follows: SciDB’s operator frame¬ 
work implements a bool requiresRepart () function that 
helps the optimizer to decide whether the input array re¬ 
quires repartitioning before the operator actually executes. 
The partitioning strategy is determined by the operator se¬ 
mantics. For example, WindowAggregate operator in 
SciDB requires repartitioning with overlap in case the input 
array is not already partitioned in that manner. We extend 
the SciDB operator interface such that ArrayLoop can dy¬ 
namically set the returned value of the operator’s requires¬ 


Repart () function. To update overlap data, ArrayLoop sets 
the requiresRepart () return value to true. ArrayLoop has 
the flexiblity to set the value to true either at each itera¬ 
tion or every few iterations as we discuss further below. In 
case an operator in SciDB is guided by ArrayLoop to re¬ 
quest repartitioning, the SciDB optimizer injects the Scat¬ 
ter/Gather operators to shuffle the data in the input 
iterative array before the operator executes. 

A benefit of using SciDB’s existing Scatter/Gather op¬ 
erators, is that they shuffle array data one chunk (i.e., sub¬ 
array) at a time. Chunk-based data shuffling is faster com¬ 
pared with the method that shuffles overlap data one cell 
at a time. The downside of using SciDB’s Scatter/Gather 
general operators is the relative higher cost of data shuffling 
when only a few overlap cells have changed. 

6.2 Mini-Iteration Processing 

Keeping overlap cells updated at each iteration requires 
reading data from disk, shuffling it across the network, and 
writing it to disk. These are all expensive operations. Any 
reduction in the number of such data synchronization steps 
can yield significant performance improvements. 

We observe that a large subset of iterative applications 
have the property that overlap cells can be updated only ev¬ 
ery few iterations. These are applications, for example, that 
try to find structures in the array data, e.g. SourceDetect 
application. These applications can find structures locally 
and eventually need to exchange information to stitch these 
local structures into larger ones. For those applications, Ar¬ 
rayLoop can add the following additional optimization: Ar¬ 
rayLoop runs the algorithm for multiple iterations without 
updating the replicas of overlap cells. The application iter¬ 
ates over chunks locally and independently of other chunks. 
Every few iterations, ArrayLoop triggers the update of over¬ 
lap cells, and continues with another set of local iterations. 
The key idea behind this approach is to avoid data move¬ 
ment across array chunks unless a large enough amount of 
change justifies the cost. 

We call each series of local iterations without overlap cell 
synchronization mini iterations. Figure illustrates the 
schematic of the mini iteration optimization. 

An alternative approach is for the scheduler to delegate 
the decision to shuffle overlap data to individual chunks, 
rather than making the decision array-global as we do in this 
paper. We leave this extra optimization for future work. 

ArrayLoop includes a system-configurable function 
SIGNAL-OPT() that takes as input an iteration number and 
a delta iterative array, which represents the changes in the 








































































Figure 8: Control flow diagram for mini-iteration- 
based processing in ArrayLoop. 

last iteration. This function is called at the beginning of 
each iteration. The output of this function defines if the 
overlap data at the current iteration needs to be shuffled. A 
control flow diagram of this procedure is shown in Figure]^ 
There exists an optimization opportunity to exploit: Do we 
exchange overlap cells every iteration? Or do we wait until 
local convergence? Or something in between these two ex¬ 
tremes? We further examine those optimization questions 
in Section [8] 

7. MULTI-RESOLUTION OPTIMIZATION 

In many scientific applications, raw data lives in a con¬ 
tinuous space (3D universe, 2D ocean, N-D space of contin¬ 
uous variables). Scientists often perform continuous mea¬ 
surements over the raw data and then store a discretized 
approximation of the real data in arrays. In these scenarios, 
different levels of granularity for arrays are possible and sci¬ 
entifically meaningful to analyze. In fact, it is common for 
scientists to look at the data at different levels of detail. 

As discussed earlier, many algorithms search for struc¬ 
ture in array data. One example is the extraction of celes¬ 
tial objects from telescope images, snow cover regions from 
satellite images, or clusters from an N-D dataset. In these 
algorithms, it is often efficient to first identify the outlines 
of the structures on a low-resolution array, and then refine 
the details on high-resolution arrays. We call this array- 
specific optimization multi-resolution optimization. This 
multi-resolution optimization is a form of prioritized pro¬ 
cessing. By first processing a low-resolution approximation 
of the data, we focus on identifying and approximating the 
overall shape of the structures. Further processing of higher- 
resolution arrays helps extract the more detailed outlines of 
these structures. 

In the rest of this section we describe how ArrayLoop 
automates this optimization for iterative computations in 
SciDB. We use the KMeans application described in Sec¬ 
tion 2.3 and the SourceDetect application described in Sec¬ 
tion 2.2 as our illustrative examples. 

To initiate the multi-resolution optimization, ArrayLoop 
initially generates a series of versions. A*, ..., A^, of 

the original iterative array A. Each version has a differ¬ 
ent resolution. A* is the original array. It has the highest 
resolution. A^ is the lowest-resolution array. Figure [^illus¬ 
trates three pixelated versions of an Isst image represented 
as iterative array A° in the context of the SourceDetect ap- 



Figure 9: Illustration of the multi-resolution opti¬ 
mization for the SourceDetect application. There is a 
sequence of three grid operations initiated from the 

. . 1 T . /lO /lO ,2,2) ^ gridp{A^ ,2,2) 

original Isst image A : A -> A - > 

j^2 9’^^dp(A^ ,2,2)^ ^3 


plication. The coarser-grained, pixelated versions are gen¬ 
erated by applying a sequence of grid followed by filter 
operations represented together as gridpO, where p is the 
predicate of the filter operator. The size and the aggregate 
function in the grid operator are application-specific and are 
specified by the user. The SourceDetect application has a 
grid-size of (2 x 2) and an aggregate function count with a 
filter predicate that only passes grid blocks without empty 
cells (in this scenario all the grid blocks with counted). This 
ensures that cells that are identified to be in the same clus¬ 
ter in a coarsened version of the array, remain together in 
finer grained versions of the array as well. In other words, 
the output of the iterative algorithm on the pixelated ver¬ 
sion array A^ should be a valid intermediate step for . 
ArrayLoop runs the iterative function Q on the sequence of 
pixelated arrays in order of increasing resolution. The out¬ 
put of the iterative algorithm after convergence at pixelated 
version A* is transformed into a finer-resolution version us¬ 
ing an xgrid operator (inverse of a grid operator). It is 
then merged with A'~^, the next immediate finer-grained 
version of the iterative array. We represent both operations 
as xgridmO ■ The xgrid operator [^ produces a result ar¬ 
ray by scaling up its input array. Within each dimension, 
the xgrid operator duplicates each cell a specified number 
of times before moving to the next cell. The following il¬ 
lustrates the ordered list of operators called by ArrayLoop 
during multi-resolution optimizations: 


^0 a-ridpO^ gridpO^ gridpj) ^ 


Ai -5> A*i ^0 


(4) 


where A*“ is the output of the iterative algorithm Q on 
pixelated array A\ and A-’~^ is replaced with A^“^ as the 
new input for the iterative computation at pixelated version 

(i - 1). 

By carefully merging the approximate results with the in¬ 
put array at the next finer-grained level, ArrayLoop skips a 
significant amount of computation. 

The K-means clustering algorithm on points in a continu¬ 
ous space is another example application that benefits from 
this optimization. The KMeans application can use an arbi¬ 
trary grid size. It also uses count as the aggregate function 
with a filter predicate that passes grid blocks that have at 
least one non-empty cell. It is easy to observe that in case 
of K-means clustering, AJ“^ is a valid labeling for the next 

























pixelated array A^~^. Basically, K-means clustering on 
produces a set of centroids for the k-means algorithm on 
A^~^ that lead to a faster convergence than a random set of 
initial centroids. 

The advantage of applying the multi-resolution optimiza¬ 
tion goes beyond better query runtime performance. This 
optimization can also help when the original iterative ar¬ 
ray changes, which is described as the following additional 
optimization: 


Input Change Optimization. If ArrayLoop materializes 
the outputs A** for all the pixelated versions of the original 
array A, then there is an interesting optimization in case 
the original iterative array A is modihed. Unlike the Naiad 
system 20 that materializes the entire state at each itera¬ 


tion to skip some computation in case of change in the input 
data, ArrayLoop takes a different strategy. When changes 
in the input occur, ArrayLoop re-generates the pixelated ar¬ 
rays A*s in Equation!^ but only runs the iterative algorithm 
Q for those arrays A®s that have also changed in response 
to the input array change. If A‘ did not change for some i, 

ArrayLoop skips the computation A*’ A**’ 'ik > i and 

uses the materialized result A*® from the previous run to 
produce A]p^. The intuition is that, if there are only a few 
changes in the input array, it is likely that changes are not 
carried over to all the pixelated versions of the array and 
our system reuses some results of the previous run for the 
current computation as well. 


8. EVALUATION 

In this section, we demonstrate the effectiveness of Ar- 
rayLoop’s native iterative processing capabilities including 
the three optimizations on experiments with 1TB of LSST 
images 24 . Because the LSST will only start to produce 
data in 2019, astronomers are testing their analysis pipelines 
with synthetic images that simulate as realistically as possi¬ 
ble what the survey will produce. We use one such synthetic 
dataset. The images take the form of one large 3D array (2D 
images accumulated over time) with almost 44 billion non¬ 
empty cells. The experiments are executed on a 20-machine 
cluster. (Intel(R) Xeon(R) CPU E5-2430L @ 2.00GHz) with 
64GB of memory and Ubuntu 13.04 as the operating sys¬ 
tem. We report performance for two real-scientihc applica¬ 
tions SigmaClip and SourceDetect described in Sections |2.1| 
and |2.2[ respectively. SigmaClip runs on the large 3D array 
and SourceDetect runs on the co-added 2D version of the 
whole dataset. 

8.1 Incremental Iterative Processing 

We hrst demonstrate the effectiveness of our approach 
to bringing incremental processing to the iterative array 
model in the context of the SigmaClip application. Fig¬ 
ure 10(a) shows the total runtime of the algorithm with dif¬ 


ferent execution strategies. As shown, the non-incremental 
“sigma-clipping” algorithm performs almost four times worse 
than any other approach. The manual-incr approach is 
the incr-sigma-clipping function from Section which 
is the manually-written incremental version of the “sigma¬ 
clipping” algorithm. This approach keeps track of all the 
points that are still candidates to be removed at the next 
iteration and discards the rest. By doing so, it touches the 
minimum number of cells from the input dataset at each it- 
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(a) SigmaClip application with different strate¬ 
gies. 
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(b) Total runtime for SigmaClip for different strategies in sec¬ 
onds. 


Figure 10: Runtime of the SigmaClip application 
with and without incremental processing. Constant 
fc = 3 in all the algorithms. 

eration. Although manual-incr performs better than other 
approaches at later stages of the iterative computation, it 
incurs significant overhead during the first few iterations 
due to the extra data points tracking (Lines 25 to 28 in 
incr-sigma-clipping0 function), manual-incr also re¬ 
quires a post-processing phase at the end of the iterative 
computation to return the final result, ef f icient-incr and 
eff icient-incr+storage are the two strategies used by Ar¬ 
rayLoop (ArrayLoop-incr-sigma-clipping function from 
Section]^, eff icient-incr represents ArrayLoop’s query 
rewrite for incremental state management that also lever¬ 
ages our merge operator, eff icient-incr+storage further 
includes the storage manager extensions. Figure [T0(b)[ shows 
the total runtime in each case. ArrayLoop’s efficient ver¬ 
sions of the algorithm are competitive with the manually 
written variant. They even outperform the manual version 
in this application. All the incremental approaches beat the 
non-incremental one by a factor of 4 — 6X. Interestingly, 
our approach to push some incremental computations to the 
storage manager improves eff icient-incr by an extra 25%. 


8.2 Overlap Iterative Processing 

In Section we describe overlap processing as a tech¬ 
nique to support parallel array processing. In the case of 
an iterative computation, the challenge is to keep the over¬ 
lap data up-to-date as the iteration progresses. The solu¬ 
tion is to efhciently shuffle overlap data at each iteration. 
An optimization applicable to many applications is to per¬ 
form mini-iteration processing, where the shuffling hap¬ 
pens only periodically. Figure [1 1(a) [ shows the effectiveness 
of this optimization in the context of the SourceDetect ap¬ 
plication, which requires overlap processing. T1 refers to 
the policy where ArrayLoop shuffles overlap data at each 
iteration, or no mini-iteration processing. As expected 
this approach incurs considerable data shuffling overhead, 
although it converges faster in the SourceDetect applica¬ 
tion (Figure [ll(b)[ ). At the other extreme, we conhgure 
ArrayLoop to only shuffle overlap data after local conver¬ 
gence occurs in all the chunks. Interestingly, this approach 
performs worse than Tl. Although this approach does a 


































SourceDetect: Overlap Processing with Mini-iteration 



T1 T5 TIO converge 


■ Overlap Update Overhead ■ Iterative Computation Overhead 

(a) SourceDetect application: Tl, T5, and TIO 
refer to policies where ArrayLoop shuffles over¬ 
lap data every iteration, every 5 iterations, and 
every 10 iterations, respectively, converge is the 
strategy where ArrayLoop shuffles data only after 
local convergence occurs. 
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(b) Number of major and mini iterations. Major# is the number 
of times that overlap data is reshuffled and Mini# is the total 
number of iterations. 


SourceDetect: Multi-resolution Optimization 



■ no HAO BAl ■A2 HAS .A4 


(a) SourceDetect application: TIO refers to 
the strategy where ArrayLoop shuffles overlap 
data every 10 iterations. AO, Al, A2, A3, and 
A4 are five versions of the same array with dif¬ 
ferent resolutions, where AO is the same reso¬ 
lution as the original array and A4 is the most 
pixelated version. The grid-size is (2x2). 
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(b) Iteration# that converges at each resolution. 


Figure 11: SourceDetect application: Iterative over¬ 
lap processing with mini-iteration optimization. 

minimum number of data shuffling, it suffers from the long 
tail of mini-iterations (Figure [ll(b)[ 94 mini-iterations). T5 
and no are two other approaches, where ArrayLoop shuffles 
data with some constant interval. We find that TIO, which 
shuffles data every ten iterations, is a good choice in this 
application. The optimal interval is likely to be application- 
specific and tuning that value automatically is beyond the 
scope of this paper. The other interesting approach is to in¬ 
struct ArrayLoop to initiate overlap data shuffling when the 
number (or magnitude) of changes between mini-iterations 
is below some threshold. We simply pick a constant number 
to determine the overlap data shuffling interval in the con¬ 
text of the SourceDetect application. More sophisticated 
approaches are left for future study. 


Figure 12: SourceDetect application: Multi¬ 

resolution Optimization. 

In Section we described a potential optimization in case 
of input data changes in the original array. As an initial 
evaluation of the potential of this approach, we modify the 
input data by dropping one image from the large, 3D array. 
This change is consistent with the LSST use-case, where a 
new set of images will be appended to the array every night. 
We observe that the new co-added image only differs in a 
small number of points from the original one. Additionally, 
these changes do not affect the pixelated array Al. This 
gives us the opportunity to re-compute the SourceDetect 
application not from the beginning, but from the pixelated 
version Al. Although the performance gain is not major 
in this scenario, it demonstrates the opportunity for further 
novel optimizations that we leave for future work. 


8.3 Multi-Resolution Optimization 

The multi-resolution optimization is a form of prior¬ 
itized processing. By first processing a low-resolution ap¬ 
proximation of the data, we focus on identifying the over¬ 
all shape of the structures. Further processing of higher- 
resolution (larger) arrays then extracts the more detailed 
outlines of these structures. Figure 12(a) shows the ben¬ 
efits of this approach in the context of the SourceDetect 
application. We generate four lower-resolution versions of 
the source array AO by sequentially calling the gridO op¬ 
erator with a grid-size of (2x2). We operate on these multi¬ 
resolution versions exactly as described in Equation]^ The 
performance results are compared to those of TIO from 
Figure |ll(a)| as we pick the same overlap-processing pol¬ 
icy to operate on each multi-resolution array. Interestingly, 
the multi-resolution optimization cuts runtimes nearly in 
half. Note that most of the saving comes from the fact that 
the algorithm converges much faster in AO compared to its 
counterpart TIO (Figure [l2(b)[ ) thanks to the previous runs 
over arrays Al through A4, where most of the cell-points 
are already labeled with their final cluster values. 


9. RELATED WORK 

Several systems have been developed that support itera¬ 
tive big data analytics |16[ |29[ |35| . Some have explicit 
iterations, others require an external driver to support itera¬ 
tions, but none of them provides native support for iterative 
computation in the context of parallel array processing. 

Twister [^, Daytona [^, and HaLoop extend MapRe¬ 
duce to add a looping construct and preserve state across 
iterations. HaLoop takes advantage of the task scheduler to 
increase local access to the static data. However, our system 
takes advantage of iterative array processing to increase lo¬ 
cal access to the dynamic data as well by applying overlap 
iterative processing. 

Priter is a distributed framework for fast iterative 
computation. The key idea of Priter is to prioritize iter¬ 
ations that ensure fast convergence. In particular, Priter 
gives each data point a priority value to indicate the im¬ 
portance of the update and it enables selecting a subset of 
data rather than all the data to perform updates in each it¬ 
eration. ArrayLoop also supports a form of prioritized pro¬ 
cessing through multi-resolution optimization. ArrayLoop 


















































































initially finds course-grained outlines of the structures on 
the more pixelated versions of the array, and then it refines 
the details on fine-grained versions. 

REX is a parallel shared-nothing query processing 
platform implemented in Java with a focus on supporting 
incremental iterative computations in which changes, in the 
form of deltas, are propagated from iteration to iteration. 
Similar to REX, ArrayLoop supports incremental iterative 
processing. However REX lacks other optimization tech¬ 
niques that we provide. 

A handful of systems exist that support iterative com¬ 
putation with focus on graph algorithms. Pregel is a 
bulk synchronous message passing abstraction where ver¬ 
tices hold states and communicate with neighboring vertices. 
Unlike Pregel, ArrayLoop relieves the synchronization bar¬ 
rier overhead by including mini-iteration steps in the itera¬ 
tive query plan. Unlike ArrayLoop, Pregel does not priori¬ 
tize iterative computation. 

GraphLab 16 develops a programming model for iter¬ 
ative machine learning computations. The GraphLab ab¬ 
straction consists of three main parts; the data graph, the 
dynamic asynchronous computation as update functions, 
and the globally sync operation. Similar to our overlap iter¬ 
ative processing technique, GraphLab has a notion of ghost 
nodes. However, the granularity of computation is per node, 
while ArrayLoop supports overlap iterative processing per 
chunk. Our system also supports prioritization through the 
novel multi-resolution iterative processing. 

Prior work also studies array processing on in-situ data . 
While this work addresses the limitation that array data 
must first be loaded into an array engine before it can be 
analyzed, it does not provide any special support for iter¬ 
ative computation. SciDB and our extensions are designed 
for scenarios where loading times amortize over a sufficiently 
large amount of data analysis. 


10. CONCLUSION 

In this paper, we developed a model for iterative process¬ 
ing in a parallel array engine. We then presented three opti¬ 
mizations to improve the performance of these types of com¬ 
putations: incremental processing, mini-iteration overlap 
processing, and multi-resolution processing. Experiments 
with a 1TB scientific dataset show that our optimizations 
can cut runtimes by 4-6X for incremental processing, 31% 
for overlap processing with mini-iterations, and almost 2X 
for the multi-resolution optimization. Interestingly, the op¬ 
timizations are complementary and can be applied at the 
same time, cutting runtimes to a small fraction of the per¬ 
formance without our approach. 
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